LTAG-spinal treebank and parser for Hindi
Authors
Abstract
Statistical parsers need huge annotated treebanks to learn from, and building treebanks is expensive. Building a separate treebank for each grammar formalism in a language is therefore not feasible. A treebank available in one formalism can instead be converted into another, either automatically or with minimal human effort, by exploiting the similarities and differences between the two formalisms. In this work, we present an approach to extract an LTAG-spinal treebank from the Hyderabad Dependency Treebank for Hindi. LTAG-spinal is a variant of Lexicalized Tree Adjoining Grammar (LTAG) with desirable linguistic, computational and statistical properties. A bidirectional LTAG dependency parser is trained on the extracted treebank, and an LTAG dependency accuracy of 80.86% is reported.
Similar resources
LTAG-spinal and the Treebank: a new resource for incremental, dependency and semantic parsing
Abstract. We introduce LTAG-spinal, a novel variant of traditional Lexicalized Tree Adjoining Grammar (LTAG) with desirable linguistic, computational and statistical properties. Unlike in traditional LTAG, subcategorization frames and the argument-adjunct distinction are left underspecified in LTAG-spinal. LTAG-spinal with adjunction constraints is weakly equivalent to LTAG. The LTAG-spinal for...
Statistical LTAG Parsing
Libin Shen, Aravind K. Joshi. In this work, we apply statistical learning algorithms to Lexicalized Tree Adjoining Grammar (LTAG) parsing, as an effort toward statistical analysis over deep structures. LTAG parsing is a well-known hard problem. Statistical methods successfully applied to LTAG parsing could also be used in many other structure prediction problems in NLP. F...
Statistical Morphological Tagging and Parsing of Korean with an LTAG Grammar
This paper describes a lexicalized tree adjoining grammar (LTAG) based parsing system for Korean which combines corpus-based morphological analysis and tagging with a statistical parser. Part of the challenge of statistical parsing for Korean comes from the fact that Korean has free word order and a complex morphological system. The parser uses an LTAG grammar which is automatically extracted u...
Exploration of the LTAG-Spinal Formalism and Treebank for Semantic Role Labeling
LTAG-spinal is a novel variant of traditional Lexicalized Tree Adjoining Grammar (LTAG) introduced by (Shen, 2006). The LTAG-spinal Treebank (Shen et al., 2008) combines elementary trees extracted from the Penn Treebank with Propbank annotation. In this paper, we present a semantic role labeling (SRL) system based on this new resource and provide an experimental comparison with CCGBank and a st...
Bidirectional Dependency Parser for Hindi, Telugu and Bangla
This paper describes the dependency parser we used in the NLP Tools Contest, 2009 for parsing Hindi, Bangla and Telugu. The parser uses a bidirectional parsing algorithm with two operations, proj and non-proj, to build the dependency tree. The parser obtained a Labeled Attachment Score of 71.63%, 59.86% and 67.74% for Hindi, Telugu and Bangla respectively on the treebank with fine-grained dependenc...
Publication date: 2009